Improved Word-Level Alignment: Injecting Knowledge about MT Divergences
Authors
Abstract
Word-level alignments of bilingual text (bitexts) are not only an integral part of statistical machine translation models, but also useful for lexical acquisition, treebank construction, and part-of-speech tagging. The frequent occurrence of divergences, structural differences between languages, presents a great challenge to the alignment task. We resolve some of the most prevalent divergence cases by using syntactic parse information to transform the sentence structure of one language so that it more closely resembles that of the other language. In this paper, we show that common divergence types can be found in multiple language pairs (in particular, we focus on English-Spanish and English-Arabic) and can be systematically identified. We describe our techniques for modifying English parse trees to produce sentences that are more similar to the sentences in the other languages; finally, we present an empirical analysis comparing the complexities of performing word-level alignments with and without divergence handling. Our results suggest that divergence handling can improve word-level alignment.
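The idea of transforming an English parse tree so that its word order lies closer to the other language can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy tuple-based tree, the `reorder_np` rule (moving adjectives after the noun inside a noun phrase, as in Spanish), and the example sentence are hypothetical and are not taken from the paper's actual divergence rules.

```python
# Toy parse tree: phrases are (label, child, child, ...); leaves are (tag, word).
# This representation and the rule below are illustrative assumptions only.

def is_leaf(node):
    """A leaf is a (tag, word) pair whose second element is a string."""
    return len(node) == 2 and isinstance(node[1], str)

def reorder_np(node):
    """Hypothetical rule: inside every NP, move adjectives (JJ) to the end."""
    if is_leaf(node):
        return node
    label, *children = node
    children = [reorder_np(child) for child in children]
    if label == "NP":
        adjectives = [c for c in children if is_leaf(c) and c[0] == "JJ"]
        others = [c for c in children if not (is_leaf(c) and c[0] == "JJ")]
        children = others + adjectives
    return (label, *children)

def words(node):
    """Read the sentence off the tree, left to right."""
    if is_leaf(node):
        return [node[1]]
    result = []
    for child in node[1:]:
        result.extend(words(child))
    return result

english = (
    "S",
    ("NP", ("DT", "the"), ("JJ", "white"), ("NN", "house")),
    ("VP", ("VBZ", "is"), ("ADJP", ("JJ", "big"))),
)

original = words(english)               # the white house is big
reordered = words(reorder_np(english))  # the house white is big
print(" ".join(original))
print(" ".join(reordered))
```

After the rewrite, the English word order lines up position by position with a Spanish rendering such as "la casa blanca es grande", which is the kind of closer resemblance that makes a word-level aligner's job easier. The adjective inside the ADJP is left in place because the rule fires only on NP nodes.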
Similar resources
Multi-align: Combining Linguistic and Statistical Techniques to Improve Alignments for Adaptable MT
The continuously growing MT market faces the challenge of translating new languages, diverse genres, and different domains using a variety of available linguistic resources. As such, MT system adaptability has become a sought-after necessity. An adaptable statistical or Hybrid MT system relies heavily on the quality of word-level alignments of real-world data. Statistical alignment approaches p...
Improving Bitext Word Alignments via Syntax-based Reordering of English
We present an improved method for automated word alignment of parallel texts which takes advantage of knowledge of syntactic divergences, while avoiding the need for syntactic analysis of the less resource rich language, and retaining the robustness of syntactically agnostic approaches such as the IBM word alignment models. We achieve this by using simple, easily-elicited knowledge to produce s...
Combining Linguistic and Machine Learning Techniques for Word Alignment Improvement
Necip Fazıl Ayan, Doctor of Philosophy, 2005. Dissertation directed by Professor Bonnie J. Dorr, Department of Computer Science. Alignment of words, i.e., detection of corresponding units between two sentences that are translations of each other, has been shown to be crucial for the success ...
DUSTer: A Method for Unraveling Cross-Language Divergences for Statistical Word-Level Alignment
The frequent occurrence of divergences, structural differences between languages, presents a great challenge for statistical word-level alignment. In this paper, we introduce DUSTer, a method for systematically identifying common divergence types and transforming an English sentence structure to bear a closer resemblance to that of another language. Our ultimate goal is to enable more accurate al...
Divergence Unraveling for Word Alignment of Parallel Corpora
We describe the use of parallel text for divergence unraveling in word-level alignment. DUSTer (Divergence Unraveling for Statistical Translation) is a system that combines linguistic and statistical knowledge to resolve structural differences between languages, i.e., translation divergences, during the process of alignment. Our immediate goal is to induce word-level alignments that are more ac...